For this water quality analysis, I will use a dataset that covers the major factors affecting the potability of water. Each of these factors matters, so we will briefly explore every feature in the dataset. Dataset: https://raw.githubusercontent.com/amankharwal/Website-data/master/water_potability.csv

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
In [2]:
data=pd.read_csv(r"C:\Users\$$$\Downloads\water_quality.csv")
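The path in the cell above is specific to one machine. As a minimal sketch, the same call can be wrapped so that it also accepts the dataset URL quoted in the introduction. `DATA_URL` and `load_water_data` are names introduced here for illustration, and it is an assumption (not verified here) that the file at that URL matches the local copy.

```python
import io

import pandas as pd

# URL quoted in the introduction; assumed to hold the same data as the local file.
DATA_URL = "https://raw.githubusercontent.com/amankharwal/Website-data/master/water_potability.csv"


def load_water_data(source=DATA_URL):
    """Read the water data from a URL, a local path, or a file-like object."""
    return pd.read_csv(source)


# Offline demonstration: two rows and three columns taken from the data.head() output.
sample_csv = io.StringIO(
    "ph,Hardness,Potability\n"
    ",204.890455,0\n"
    "3.716080,129.422921,0\n"
)
sample = load_water_data(sample_csv)
print(sample.shape)
```

Calling `load_water_data()` with no argument would fetch the file over the network; passing a local path keeps the notebook's original behaviour.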
In [3]:
data.head()
Out[3]:
         ph    Hardness        Solids  Chloramines     Sulfate  Conductivity  Organic_carbon  Trihalomethanes  Turbidity  Potability
0       NaN  204.890455  20791.318981     7.300212  368.516441    564.308654       10.379783        86.990970   2.963135           0
1  3.716080  129.422921  18630.057858     6.635246         NaN    592.885359       15.180013        56.329076   4.500656           0
2  8.099124  224.236259  19909.541732     9.275884         NaN    418.606213       16.868637        66.420093   3.055934           0
3  8.316766  214.373394  22018.417441     8.059332  356.886136    363.266516       18.436524       100.341674   4.628771           0
4  9.092223  181.101509  17978.986339     6.546600  310.135738    398.410813       11.558279        31.997993   4.075075           0
In [4]:
data.isnull().sum()
Out[4]:
ph                 491
Hardness             0
Solids               0
Chloramines          0
Sulfate            781
Conductivity         0
Organic_carbon       0
Trihalomethanes    162
Turbidity            0
Potability           0
dtype: int64
In [5]:
data=data.dropna()
In [6]:
data.isnull().sum()
Out[6]:
ph                 0
Hardness           0
Solids             0
Chloramines        0
Sulfate            0
Conductivity       0
Organic_carbon     0
Trihalomethanes    0
Turbidity          0
Potability         0
dtype: int64
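Dropping every row that has a missing value can discard a large share of the data, so it may be worth checking how many rows `dropna()` removes. The sketch below does this on a toy frame built from the `data.head()` values, and also shows median imputation as one possible alternative. The imputation step is not part of this notebook; it is included only for comparison.

```python
import numpy as np
import pandas as pd

# Toy frame with values from the data.head() output above (three columns only).
toy = pd.DataFrame({
    "ph": [np.nan, 3.716080, 8.099124, 8.316766, 9.092223],
    "Sulfate": [368.516441, np.nan, np.nan, 356.886136, 310.135738],
    "Potability": [0, 0, 0, 0, 0],
})

# Count how many rows are lost when any row with a missing value is dropped.
rows_before = len(toy)
dropped = toy.dropna()
rows_lost = rows_before - len(dropped)
print(f"dropna() removed {rows_lost} of {rows_before} rows")

# Alternative (not used in this notebook): fill gaps with each column's median.
imputed = toy.fillna(toy.median(numeric_only=True))
print(imputed.isnull().sum().sum())
```

On the full dataset the same check is `len(data)` before and after the `dropna()` call.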
In [7]:
plt.figure(figsize=(8, 5))
sns.countplot(x="Potability",data=data)
plt.title("Distribution of Unsafe (0) and Safe (1) Water")
Out[7]:
Text(0.5, 1.0, 'Distribution of Unsafe (0) and Safe (1) Water')
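The count plot shows the class balance visually. The same information can be read as numbers with `value_counts`, as in the sketch below. The labels here are hypothetical and only illustrate the call; in the notebook the input would be `data["Potability"]`.

```python
import pandas as pd

# Hypothetical labels for illustration; in the notebook use data["Potability"].
labels = pd.Series([0, 0, 0, 1, 1], name="Potability")

# Absolute counts and relative shares per class, ordered by class label.
counts = labels.value_counts().sort_index()
shares = labels.value_counts(normalize=True).sort_index()

summary = pd.DataFrame({"count": counts, "share": shares})
print(summary)
```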
In [8]:
import plotly.express as px

The ph column holds the pH value of the water, an important indicator of its acid-base balance. The pH of drinking water should be between 6.5 and 8.5.

In [9]:
fig=px.histogram(data,x='ph',
                color='Potability',
                title='Factors Affecting Water Quality: pH')
fig.show()
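To complement the histogram, the 6.5 to 8.5 range can be checked directly. The sketch below computes the share of samples inside that range for each Potability class. The values are hypothetical; in the notebook the same two lines would operate on `data`.

```python
import pandas as pd

# Hypothetical values for illustration; in the notebook use `data` directly.
demo = pd.DataFrame({
    "ph": [3.7, 8.1, 8.3, 9.1, 7.0, 6.4],
    "Potability": [0, 0, 0, 0, 1, 1],
})

# Series.between is inclusive at both ends by default.
demo["ph_in_range"] = demo["ph"].between(6.5, 8.5)

# The mean of a boolean column is the share of True values.
share_in_range = demo.groupby("Potability")["ph_in_range"].mean()
print(share_in_range)
```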

The hardness of water usually depends on its source; water with a hardness of 120-200 mg/L is considered drinkable.

In [10]:
fig=px.histogram(data,x='Hardness',
                color='Potability',
                title='Factors Affecting Water Quality: Hardness')
fig.show()

All organic and inorganic minerals present in water are called dissolved solids. Water with a very high concentration of dissolved solids is highly mineralized.

In [11]:
fig=px.histogram(data,x='Solids',
                color='Potability',
                title='Factors Affecting Water Quality: Solids')
fig.show()

Chloramine and chlorine are disinfectants used in public water systems.

In [12]:
fig=px.histogram(data,x='Chloramines',
                color='Potability',
                title='Factors Affecting Water Quality: Chloramines')
fig.show()

Sulfates are substances naturally present in minerals, soil, and rocks. Water containing less than 500 mg/L of sulfate is considered safe to drink.

In [13]:
fig=px.histogram(data,x="Sulfate",
                color='Potability',
                title="Factors Affecting Water Quality: Sulfate")
fig.show()

Water found in nature conducts electricity well because of the ions dissolved in it, whereas pure water is a poor conductor. Water with an electrical conductivity below 500 µS/cm is considered drinkable.

In [14]:
figure = px.histogram(data, x = "Conductivity", 
                      color = "Potability", 
                      title= "Factors Affecting Water Quality: Conductivity")
figure.show()

Organic carbon comes from the breakdown of natural organic materials and from synthetic sources. Water containing less than 25 mg/L of organic carbon is considered safe to drink.

In [15]:
figure = px.histogram(data, x = "Organic_carbon", 
                      color = "Potability", 
                      title= "Factors Affecting Water Quality: Organic Carbon")
figure.show()

Trihalomethanes (THMs) are chemicals found in chlorine-treated water. Water containing less than 80 µg/L of THMs is considered safe to drink.

In [16]:
figure = px.histogram(data, x = "Trihalomethanes", 
                      color = "Potability", 
                      title= "Factors Affecting Water Quality: Trihalomethanes")
figure.show()

The turbidity of water depends on the amount of solids held in suspension. Water with a turbidity below 5 NTU is considered drinkable.

In [18]:
figure = px.histogram(data, x = "Turbidity", 
                      color = "Potability", 
                      title= "Factors Affecting Water Quality: Turbidity")
figure.show()
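The limits quoted in the notes above can be collected in one place and checked against the data. This is a rough sketch rather than a potability test: `LIMITS` and `within_limits` are names introduced here, bounds are treated as inclusive for simplicity, and it assumes each column is recorded in the same unit as its quoted limit. The demo rows are rows 0 and 4 of the `data.head()` output.

```python
import numpy as np
import pandas as pd

# (lower, upper) limits as quoted in the text; None means no bound on that side.
# Assumption: each column is recorded in the same unit as its quoted limit.
LIMITS = {
    "ph": (6.5, 8.5),
    "Hardness": (120, 200),
    "Sulfate": (None, 500),
    "Conductivity": (None, 500),
    "Organic_carbon": (None, 25),
    "Trihalomethanes": (None, 80),
    "Turbidity": (None, 5),
}


def within_limits(frame, limits=LIMITS):
    """Return a boolean frame: True where a value satisfies its quoted limit."""
    flags = pd.DataFrame(index=frame.index)
    for column, (low, high) in limits.items():
        ok = frame[column].notna()  # missing values never count as within limits
        if low is not None:
            ok = ok & (frame[column] >= low)
        if high is not None:
            ok = ok & (frame[column] <= high)  # inclusive bound, a simplification
        flags[column] = ok
    return flags


# Rows 0 and 4 from the data.head() output (Solids and Chloramines have no quoted limit).
demo = pd.DataFrame({
    "ph": [np.nan, 9.092223],
    "Hardness": [204.890455, 181.101509],
    "Sulfate": [368.516441, 310.135738],
    "Conductivity": [564.308654, 398.410813],
    "Organic_carbon": [10.379783, 11.558279],
    "Trihalomethanes": [86.990970, 31.997993],
    "Turbidity": [2.963135, 4.075075],
})
flags = within_limits(demo)
print(flags.T)
```

On the full dataset, `within_limits(data).mean()` would give the share of samples inside each quoted limit, and grouping those flags by `data["Potability"]` would show how well each limit separates the two classes.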

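The goal of this notebook is to analyze the data by looking at how each feature is distributed across unsafe (0) and safe (1) water samples.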